Vox Populi: Collecting High-Quality Labels from a Crowd

Authors

  • Ofer Dekel
  • Ohad Shamir
Abstract

With the emergence of search engines and crowdsourcing websites, machine learning practitioners are faced with datasets that are labeled by a large heterogeneous set of teachers. These datasets test the limits of our existing learning theory, which largely assumes that data is sampled i.i.d. from a fixed distribution. In many cases, the number of teachers actually scales with the number of examples, with each teacher providing just a handful of labels, precluding any statistically reliable assessment of an individual teacher’s quality. In this paper, we study the problem of pruning low-quality teachers in a crowd, in order to improve the label quality of our training set. Despite the hurdles mentioned above, we show that this is in fact achievable with a simple and efficient algorithm, which does not require that each example be repeatedly labeled by multiple teachers. We provide a theoretical analysis of our algorithm and back our findings with empirical evidence.
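
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic way to prune low-quality teachers when each example is labeled once: fit a model on all labels, measure how often each teacher disagrees with it, and drop teachers whose disagreement rate is too high. It should not be read as the authors' method. The nearest-centroid classifier, the 1-D feature, the `max_disagreement` threshold, and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Each example is (feature, label, teacher_id); every example has ONE label.
# Hypothetical toy data: teachers A and B are reliable, C flips labels.
examples = [
    (-1.0, 0, "A"), (-0.8, 0, "A"), (1.0, 1, "A"), (0.9, 1, "A"),
    (-1.2, 0, "B"), (1.1, 1, "B"), (0.7, 1, "B"), (-0.9, 0, "B"),
    (-1.1, 1, "C"), (1.2, 0, "C"), (-0.7, 1, "C"),
]

def fit_centroids(data):
    """Toy stand-in classifier: mean feature value per label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, label, _teacher in data:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def prune_teachers(data, max_disagreement=0.5):
    """Keep teachers whose labels mostly agree with a model fit on all data.

    Assumption: the threshold is an arbitrary illustrative choice.
    """
    centroids = fit_centroids(data)
    disagree, total = defaultdict(int), defaultdict(int)
    for x, label, teacher in data:
        total[teacher] += 1
        if predict(centroids, x) != label:
            disagree[teacher] += 1
    kept = {t for t in total if disagree[t] / total[t] <= max_disagreement}
    # Return the surviving teachers and the examples they labeled.
    return kept, [ex for ex in data if ex[2] in kept]

kept, clean = prune_teachers(examples)
print(sorted(kept), len(clean))
```

On this toy data the label-flipping teacher C disagrees with the fitted model on every example and is removed, leaving the eight examples from A and B; a final model would then be retrained on `clean`.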

Similar resources

Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition

While recent techniques of digital sound synthesis have put numerous new sounds on the musician’s desktop, several artificial-intelligence (AI) techniques have also been applied to algorithmic composition. This article introduces Vox Populi, a system based on evolutionary computation techniques for composing music in real time. In Vox Populi, a population of chords codified according to MIDI pr...

The Wisdom of Crowds (Vox Populi) and Antidepressant Use

Under certain conditions, groups of people may (collectively) make better judgments than experts. Galton connected this phenomenon to the phrase vox populi in a 1907 paper. Arguably, an example of the phenomenon may be found in recent stabilization of the frequency of antidepressant use, following decades of increases. There is no evidence that a change in physician behaviour has caused this s...

VOX POPULI: Automatic Generation of Biased Video Sequences

We describe our experimental rhetoric engine Vox Populi that generates biased video-sequences from a repository of video interviews and other related audio-visual web sources. Users are thus able to explore their own opinions on controversial topics covered by the repository. The repository contains interviews with United States residents stating their opinion on the events occurring after the ...

A new approach to relevancy in Internet searching - the "Vox Populi Algorithm"

In this paper we will derive a new algorithm for Internet searching. The main idea of this algorithm is to extend the existing algorithms by a component, which reflects the interests of the users more than existing methods. The “Vox Populi Algorithm” (VPA) [1] creates a feedback from the users to the content of the search index. The information derived from the users query analysis is used to m...

Medical marijuana, compassionate use, and public policy: Expert opinion or vox populi?

Spend your few moment to read a book even only few pages. Reading book is not obligation and force for everybody. When you don't want to read, you can get punishment from the publisher. Read a book becomes a choice of your different characteristics. Many people with reading habit will always be enjoyable to read, or on the contrary. For some reasons, this medical marijuana compassionate use and...

Publication date: 2009